My Docker Is Hung, What Now? Waiting Online for Help
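We recently rolled out a new kubelet build to fix a problem where kubelet deleted Pods so slowly that the platform's cluster-deletion operation timed out. During the canary rollout on the isolated redis cluster, we found that after upgrading kubelet and restarting the service, a small number of hosts turned NotReady, and they stayed NotReady even after kubelet was rolled back to the previous version. The host status reported "container runtime is down". In our experience that usually means the container runtime itself is in trouble. 弹性云 (Elastic Cloud) uses docker as its container runtime, so we checked the state of docker, with the following results:

- docker ps, which lists the state of all containers, ran normally
- docker inspect, which shows the detailed state of a single container, blocked

This is typical docker hang behavior. We are in the middle of a docker upgrade: existing hosts run docker 1.13.1 and are being moved gradually to 18.06.3, while all new hosts already run 18.06.3. The hang is more total on 1.13.1, where docker ps itself hangs, so once a single container goes bad the whole daemon stops responding. Docker 18.06.3 made a small improvement by dropping the container-level lock from docker ps, but docker inspect still takes the container lock. As a result, a single bad container no longer makes the docker service unresponsive; only that container is affected, and no operation on it can complete.

We use docker ps and docker inspect as the health indicators because these are the two docker APIs kubelet relies on to obtain container state.

So there are two questions:

- What is the root cause of the docker hang?
- While docker is hung, why does restarting kubelet turn the host NotReady?

2. Restarting kubelet changes the host status

The host going from Ready to NotReady after a kubelet restart is the simpler of the two questions, so we investigated it first.

kubelet sets several Conditions on a host to describe its current state, such as whether memory is under pressure, whether the thread count is under pressure, and whether the host is ready. ReadyCondition indicates whether the host is ready; the Status that kubectl shows for a host is the content of ReadyCondition. The common states and their meanings are:

- Ready: the host is healthy and responds to Pod events normally
- NotReady: kubelet on the host is still running but can no longer handle Pod events. In the vast majority of cases NotReady means the container runtime has a problem
- Unknown: kubelet on the host has stopped running

kubelet decides ReadyCondition as follows:

    // defaultNodeStatusFuncs is a factory that generates the default set of
    // setNodeStatus funcs
    func (kl *Kubelet) defaultNodeStatusFuncs() []func(*v1.Node) error {
        ......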
        setters = append(setters,
            nodestatus.OutOfDiskCondition(kl.clock.Now, kl.recordNodeStatusEvent),
            nodestatus.MemoryPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderMemoryPressure, kl.recordNodeStatusEvent),
            nodestatus.DiskPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderDiskPressure, kl.recordNodeStatusEvent),
            nodestatus.PIDPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderPIDPressure, kl.recordNodeStatusEvent),
            nodestatus.ReadyCondition(kl.clock.Now, kl.runtimeState.runtimeErrors, kl.runtimeState.networkErrors, validateHostFunc, kl.containerManager.Status, kl.recordNodeStatusEvent),
            nodestatus.VolumesInUse(kl.volumeManager.ReconcilerStatesHasBeenSynced, kl.volumeManager.GetVolumesInUse),
            // TODO(mtaufen): I decided not to move this setter for now, since all it does is send an event
            // and record state back to the Kubelet runtime object. In the future, I'd like to isolate
            // these side-effects by decoupling the decisions to send events and partial status recording
            // from the Node setters.
            kl.recordNodeSchedulableEvent,
        )
        return setters
    }

Digging into the implementation of nodestatus.ReadyCondition shows that whether a host is Ready depends on many conditions, including runtime checks, network checks and basic resource checks. Here we only need to look at the runtime check:

    func (s *runtimeState) runtimeErrors() []string {
        s.RLock()
        defer s.RUnlock()
        var ret []string
        if !s.lastBaseRuntimeSync.Add(s.baseRuntimeSyncThreshold).After(time.Now()) { // 1
            ret = append(ret, "container runtime is down")
        }
        if s.internalError != nil {
            ret = append(ret, s.internalError.Error())
        }
        for _, hc := range s.healthChecks { // 2
            if ok, err := hc.fn(); !ok {
                ret = append(ret, fmt.Sprintf("%s is not healthy: %v", hc.name, err))
            }
        }
        return ret
    }

The runtime check fails when either of the following holds:

1. The time since the last runtime sync exceeds the configured threshold (30s by default)
2. A runtime health check fails

So which of the two made the host NotReady? The kubelet log shows one line printed every 5s:

......
I0715 10:43:28.049240 16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
I0715 10:43:33.049359 16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
I0715 10:43:38.049492 16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
......

So condition 1 is the culprit behind the NotReady status.

Next we looked at why kubelet was not updating lastBaseRuntimeSync as expected. On startup kubelet creates a goroutine and sets lastBaseRuntimeSync in a loop inside it. The loop is:

    func (kl *Kubelet) Run(updates ......

    ...... kl.cadvisor.Start -> cc.Manager.Start -> self.createContainer -> m.createContainerLocked -> container.NewContainerHandler -> factory.CanHandleAndAccept -> self.client.ContainerInspect

Because one container was in an abnormal state, the docker inspect issued by kubelet hung as well.

So the root cause of the host going from Ready to NotReady on a kubelet restart is that one container was in an abnormal state and docker inspect on it hung. If the docker inspect hang begins only after kubelet has already restarted, the host's Ready status is not affected at all, because oneTimeInitializer is a sync.Once and therefore runs only once, when kubelet starts. In that case kubelet simply cannot handle any event for the affected Pod (deletion, updates and so on) but can still handle every event for other Pods.

One might ask why kubelet calls docker inspect at startup without a timeout. Indeed, with a timeout in place, restarting kubelet would not change the host status. We will fill this in after digging further; for now we continue with the docker hang.

3. Docker hang

Docker hangs are nothing new to us; we have seen quite a few, and the symptoms vary. Earlier investigations on docker 1.13.1 turned up some clues but never pinned down the root cause, and most incidents were eventually resolved by restarting docker. This time the hang happened on docker 18.06.3, and after nearly a week of diagnosis by our four-person team we finally identified the cause. Note that docker can hang for more than one reason, so this prescription is not a cure-all.

At this point all we knew was that docker was misbehaving and could not answer docker inspect for one particular container; we knew nothing about the details.

Tracing the call path

First we wanted a rough overall picture of what docker was doing. Anyone who develops in Go will naturally think of pprof. With pprof we captured a picture of docker's goroutines at the time:

goroutine profile: total 722373
717594 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2132ef 0x7fe8bc238c1a 0x7fe8bd56f7fe 0x7fe8bd56f6bd 0x7fe8bcea8719 0x7fe8bcea938b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
# 0x7fe8bc2132ee sync.runtime_SemacquireMutex+0x3e /usr/local/go/src/runtime/sema.go:71
# 0x7fe8bc238c19 sync.(*Mutex).Lock+0x109 /usr/local/go/src/sync/mutex.go:134
# 0x7fe8bd56f7fd github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+0x8d /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:40
# 0x7fe8bd56f6bc github.com/docker/docker/daemon.(*Daemon).ContainerInspect+0x11c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:29
# 0x7fe8bcea8718 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+0x118 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.go:15
# 0x7fe8bcea938a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:39
# 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
# 0x7fe8bcb72b00
github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
# 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
# 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
# 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP+0x45 /usr/local/go/src/net/http/server.go:1947
# 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
# 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
# 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP+0xbd /usr/local/go/src/net/http/server.go:2694
# 0x7fe8bc4b88a2 net/http.(*conn).serve+0x652 /usr/local/go/src/net/http/server.go:1830

4175 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2132ef 0x7fe8bc238c1a 0x7fe8bcc2eccf 0x7fe8bd597af4 0x7fe8bcea2456 0x7fe8bcea956b 0x7fe8bcb73dff 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
# 0x7fe8bc2132ee sync.runtime_SemacquireMutex+0x3e /usr/local/go/src/runtime/sema.go:71
# 0x7fe8bc238c19 sync.(*Mutex).Lock+0x109 /usr/local/go/src/sync/mutex.go:134
# 0x7fe8bcc2ecce github.com/docker/docker/container.(*State).IsRunning+0x2e /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/state.go:240
# 0x7fe8bd597af3 github.com/docker/docker/daemon.(*Daemon).ContainerStats+0xb3 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/stats.go:30
# 0x7fe8bcea2455 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersStats+0x1e5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container_routes.go:115
# 0x7fe8bcea956a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersStats)-fm+0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:42
# 0x7fe8bcb73dfe github.com/docker/docker/api/server/router.cancellableHandler.func1+0xce /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/local.go:92
# 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
# 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
# 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
# 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
# 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP+0x45 /usr/local/go/src/net/http/server.go:1947
# 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
# 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
# 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP+0xbd /usr/local/go/src/net/http/server.go:2694
# 0x7fe8bc4b88a2 net/http.(*conn).serve+0x652 /usr/local/go/src/net/http/server.go:1830

1 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2131fb 0x7fe8bc239a3b 0x7fe8bcbb679d 0x7fe8bcc26774 0x7fe8bd570b20 0x7fe8bd56f81c 0x7fe8bd56f6bd 0x7fe8bcea8719 0x7fe8bcea938b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
# 0x7fe8bc2131fa sync.runtime_Semacquire+0x3a /usr/local/go/src/runtime/sema.go:56
# 0x7fe8bc239a3a sync.(*RWMutex).RLock+0x4a /usr/local/go/src/sync/rwmutex.go:50
# 0x7fe8bcbb679c github.com/docker/docker/daemon/exec.(*Store).List+0x4c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec/exec.go:140
# 0x7fe8bcc26773 github.com/docker/docker/container.(*Container).GetExecIDs+0x33 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/container.go:423
# 0x7fe8bd570b1f github.com/docker/docker/daemon.(*Daemon).getInspectData+0x5cf /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:178
# 0x7fe8bd56f81b github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+0xab /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:42
# 0x7fe8bd56f6bc github.com/docker/docker/daemon.(*Daemon).ContainerInspect+0x11c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:29
# 0x7fe8bcea8718 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+0x118 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.go:15
# 0x7fe8bcea938a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:39
# 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
# 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
# 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
# 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
# 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP+0x45 /usr/local/go/src/net/http/server.go:1947
# 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
# 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
# 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP+0xbd /usr/local/go/src/net/http/server.go:2694
# 0x7fe8bc4b88a2 net/http.(*conn).serve+0x652 /usr/local/go/src/net/http/server.go:1830

1 @ 0x7fe8bc202980 0x7fe8bc212946 0x7fe8bc8b6881 0x7fe8bc8b699d 0x7fe8bc8e259b 0x7fe8bc8e1695 0x7fe8bc8c47d5 0x7fe8bd2e0c06 0x7fe8bd2eda96 0x7fe8bc8c42fb 0x7fe8bc8c4613 0x7fe8bd2a6474 0x7fe8bd2e6976 0x7fe8bd3661c5 0x7fe8bd56842f 0x7fe8bcea7bdb 0x7fe8bcea9f6b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
# 0x7fe8bc8b6880 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader+0x100 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222
# 0x7fe8bc8b699c github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress+0x2c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233
# 0x7fe8bc8e259a github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg+0x63a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515
# 0x7fe8bc8e1694 github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg+0x44 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395
# 0x7fe8bc8c47d4 github.com/docker/docker/vendor/google.golang.org/grpc.invoke+0x184 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83
# 0x7fe8bd2e0c05 github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary+0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35
# 0x7fe8bd2eda95 github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm+0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51
# 0x7fe8bc8c42fa github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke+0x10a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35
# 0x7fe8bc8c4612 github.com/docker/docker/vendor/google.golang.org/grpc.Invoke+0xc2 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60
# 0x7fe8bd2a6473 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Start+0xd3 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:421
# 0x7fe8bd2e6975 github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Start+0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:109
# 0x7fe8bd3661c4 github.com/docker/docker/libcontainerd.(*client).Exec+0x4b4 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:381
# 0x7fe8bd56842e github.com/docker/docker/daemon.(*Daemon).ContainerExecStart+0xb4e /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.go:251
# 0x7fe8bcea7bda github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart+0x34a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.go:125
# 0x7fe8bcea9f6a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm+0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:59
# 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
# 0x7fe8bcb72b00
github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
# 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
# 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
# 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP+0x45 /usr/local/go/src/net/http/server.go:1947
# 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
# 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
# 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP+0xbd /usr/local/go/src/net/http/server.go:2694
# 0x7fe8bc4b88a2 net/http.(*conn).serve+0x652 /usr/local/go/src/net/http/server.go:1830

Note that this is a trimmed-down docker goroutine dump. From it we can draw the following conclusions:

1. 717594 goroutines are blocked in docker inspect
2. 4175 goroutines are blocked in docker stats
3. 1 goroutine is blocked fetching the docker exec task IDs
4. 1 goroutine is blocked in the execution of docker exec

From these conclusions we get a basic understanding of why the abnormal container hung: a docker exec on the container never returned (4), which blocked the goroutine fetching the docker exec task IDs (3); because (3) had already acquired the container lock, docker inspect (1) and docker stats (2) got stuck behind it. So the culprit is not docker inspect but docker exec.

To dig further we need some background. The full call path when kubelet starts a container or runs a command inside a container is:

+--------------------------------------------------------------+
|                                                              |
|   +------------+                                             |
|   |            |                                             |
|   |   kubelet  |                                             |
|   |            |                                             |
|   +------|-----+                                             |
|          |                                                   |
|          |                                                   |
|   +------v-----+       +---------------+                     |
|   |            |       |               |                     |
|   |   dockerd  ------->|  containerd   |                     |
|   |            |       |               |                     |
|   +------------+       +-------|-------+                     |
|                                |                             |
|                                |                             |
|                        +-------v-------+     +-----------+   |
|                        |               |     |           |   |
|                        |containerd-shim----->|   runc    |   |
|                        |               |     |           |   |
|                        +---------------+     +-----------+   |
|                                                              |
+--------------------------------------------------------------+

dockerd and containerd can be thought of as two layers of nginx proxy, containerd-shim is the container's guardian, and runc is the workhorse that actually starts containers and runs commands. What runc does is simple: it creates namespaces (NS) according to the user-specified configuration, or enters specific existing namespaces, and then runs the user's command. Put plainly, creating a container means creating new namespaces and then running the user-specified command inside them.

With that background, we moved on to containerd. Fortunately, pprof can also give us a picture of what containerd was doing at the time:

goroutine profile: total 430
1 @ 0x7f6e55f82740 0x7f6e55f92616 0x7f6e56a8412c 0x7f6e56a83d6d 0x7f6e56a911bf 0x7f6e56ac6e3b 0x7f6e565093de 0x7f6e5650dd3b 0x7f6e5650392b 0x7f6e56b51216 0x7f6e564e5909 0x7f6e563ec76a 0x7f6e563f000a 0x7f6e563f6791 0x7f6e55fb0151
# 0x7f6e56a8412b github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch+0x24b /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102
# 0x7f6e56a83d6c github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call+0x15c /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:73
# 0x7f6e56a911be github.com/containerd/containerd/linux/shim/v1.(*shimClient).Start+0xbe /go/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1745
# 0x7f6e56ac6e3a github.com/containerd/containerd/linux.(*Process).Start+0x8a /go/src/github.com/containerd/containerd/linux/process.go:125
# 0x7f6e565093dd github.com/containerd/containerd/services/tasks.(*local).Start+0x14d /go/src/github.com/containerd/containerd/services/tasks/local.go:187
# 0x7f6e5650dd3a github.com/containerd/containerd/services/tasks.(*service).Start+0x6a /go/src/github.com/containerd/containerd/services/tasks/service.go:72
# 0x7f6e5650392a github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler.func1+0x8a /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:624
# 0x7f6e56b51215 github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor+0xa5
/go/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29
# 0x7f6e564e5908 github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler+0x168 /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:626
# 0x7f6e563ec769 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC+0x849 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:920
# 0x7f6e563f0009 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream+0x1319 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1142
# 0x7f6e563f6790 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1+0xa0 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:637

Again, we kept only the key goroutine. The stack shows that containerd is blocked waiting to receive the result of the exec. The key code, for reference:

    func (c *Client) dispatch(ctx context.Context, req *Request, resp *Response) error {
        errs := make(chan error, 1)
        call := &callRequest{
            req:  req,
            resp: resp,
            errs: errs,
        }

        select {
        case c.calls